ASYMPTOTIC EXPONENTIALITY OF THE DISTRIBUTION OF FIRST 
EXIT TIMES FOR A CLASS OF MARKOV PROCESSES WITH 
APPLICATIONS TO QUICKEST CHANGE DETECTION 



Moshe Pollak

The Hebrew University of Jerusalem
Department of Statistics
Mount Scopus
Jerusalem 91905, Israel
msmp@mscc.huji.ac.il

Alexander G. Tartakovsky

University of Southern California
Department of Mathematics
3620 S. Vermont Ave, KAP-108
Los Angeles, CA 90089-2532, USA
tartakov@usc.edu



Submitted to Probability Theory and Its Applications, March 2007 



Abstract 



We consider the first exit time of a nonnegative Harris-recurrent Markov process from
the interval $[0, A]$ as $A \to \infty$. We provide an alternative method of proof of asymptotic
exponentiality of the first exit time (suitably standardized) that does not rely on embedding in
a regeneration process. We show that under certain conditions the moment generating function
of a suitably standardized version of the first exit time converges to that of Exponential(1), and
we relate the standardizing constant to the quasi-stationary distribution (assuming
it exists). The results are applied to the evaluation of the distribution of the run length to false alarm
in change-point detection problems.

Keywords and Phrases: Markov Process, Stationary Distribution, Quasi-stationary Distribution,
First Exit Time, Asymptotic Exponentiality, Change-point Problems, CUSUM Procedures,
Shiryaev-Roberts Procedures.

1. Introduction 

Let $(\Omega, \mathcal{F}, \mathsf{P})$ be a probability space and $\{X(n)\}$, $n = 0, 1, 2, \dots$ be a discrete-time
nonnegative Harris-recurrent Markov process defined on this space. The limiting distribution as
$A \to \infty$ of the suitably standardized first exit time of the process from the interval $[0, A]$ often
turns out to be exponential.

The standard method for proving this asymptotic exponentiality is to try to find a version of the 
process that is regenerative (cf. Glasserman and Kou, 1995 and Asmussen, 2003). The heuristic 
behind this is that since the process is Harris-recurrent, it returns to a given set over and over 
again, and thus creates "cycles" that are "almost independent." Hence, the first cycle in which 
$X(n)$ exceeds $A$ is approximately geometrically distributed, and if the expected length of a cycle
is finite and the probability of exceeding $A$ in a given cycle tends to 0 as $A \to \infty$, then, suitably
standardized, the asymptotic distribution of the first exit time is exponential.

In this paper, we make a connection between the standardization constant and the
quasi-stationary distribution. Our method of proof is a coupling argument. Although it is less general
than the regeneration argument as a method for proving asymptotic exponentiality, we believe that
our method is of interest in its own right. That said, the regeneration argument seems
to be little known in the statistics community, and ought to be publicized.

The paper is organized as follows. In Section 2 we present the main result, which states that
the limiting distribution of the suitably standardized version of the first exit time as $A \to \infty$ is
Exponential(1) and that the moment generating function converges to that of Exponential(1),
which implies that the convergence is in $L^p$ for all $p \ge 1$. The proof is given in Section 3. We
make a few remarks in Section 4. In Section 5 we give examples and describe applications to
the evaluation of the distribution of the run length to false alarm for several change detection
procedures.

2. Main Results 

Let $\{X(n)\}_{n \ge 0}$ be a discrete-time Harris-recurrent Markov process with state space $[0, \infty)$ and
stationary transition probabilities. Let $\mathsf{P}^x$ denote the probability measure for the process when it
starts at $x$ (i.e., $X(0) = x$), and let $\mathsf{P}^G$ denote the probability measure when the initial state is
distributed according to the distribution $G$.

Definition. We call the process stochastically monotone if $\mathsf{P}^x(X(1) > y)$ is non-decreasing and
right-continuous in $x$ for all $y$.

We will be interested in the behavior of the first exit time of $X(n)$ from the interval $[0, A]$ when
$X(n)$ starts at $x \in [0, A)$, i.e., of the stopping time

$$N_A^x = \min\{n \ge 1 : X(n) > A\}, \quad X(0) = x, \tag{2.1}$$

where $0 \le x < A$ and $A$ is a positive finite threshold, assuming that the Markov process $X(n)$ is
stochastically monotone and Harris-recurrent.

The following theorem is the main result of the paper. 

Theorem 1. Let $X(n)$, $n = 0, 1, 2, \dots$ be a stochastically monotone Harris-recurrent Markov
process with state space $[0, \infty)$ and stationary transition probabilities such that:

C1. The stationary distribution $H(y) = \lim_{n \to \infty} \mathsf{P}^x\{X(n) \le y\}$ exists and its support is
$[0, \infty)$.

C2. The quasi-stationary distribution $H_A(y) = \lim_{n \to \infty} \mathsf{P}^x\{X(n) \le y \mid N_A^x > n\}$ exists for all
$0 \le x < A$ and for all $0 < A < \infty$.

Let $p_A = \mathsf{P}^{H_A}\{X(1) > A\}$.
Then:

(i) The distribution of $p_A N_A^x$ is asymptotically Exponential(1) as $A \to \infty$ for all fixed
$x \in [0, \infty)$.

(ii) The moment generating function $\mathsf{E} \exp\{t p_A N_A^x\}$ of $p_A N_A^x$ converges to $1/(1 - t)$ as
$A \to \infty$ for all fixed $x \in [0, \infty)$. In particular, it follows that

$$\lim_{A \to \infty} p_A \mathsf{E} N_A^x = 1 \quad \text{and} \quad \lim_{A \to \infty} \mathrm{Variance}\{p_A N_A^x\} = 1.$$

Conditions C1 and C2 hold in a variety of scenarios. See the corresponding remarks in Section 4
and examples in Section 5.

We begin with a heuristic argument. A formal proof requires several auxiliary results and is
given in Section 3.

Write $N_A^{H_A}$ for the stopping time when the process $X(n)$ starts at a random point $X(0) = \xi$
in $[0, A]$ that has the quasi-stationary distribution $H_A$, i.e., $\mathsf{P}(\xi \le y) = H_A(y)$. Then
$\mathsf{P}^{H_A}(X(n) > A \mid N_A^{H_A} \ge n) = p_A$ for all $n \ge 1$, and, therefore, the distribution of $N_A^{H_A}$ is
geometric with parameter $p_A$ for all $A > 0$. Further, under conditions C1 and C2, the probability
$p_A$ goes to 0 as $A \to \infty$, which implies that $p_A N_A^{H_A}$ converges weakly to Exponential(1) as
$A \to \infty$. Intuitively, the asymptotic behavior of the stopping time $N_A^x$ for every fixed point $x$ is
similar to that of $N_A^{H_A}$. Mathematical details are presented in the next section.
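The last step of the heuristic, that a Geometric($p$) variable multiplied by $p$ is nearly Exponential(1) when $p$ is small, can be checked numerically. The following is a minimal sketch; the function name and the chosen values of $p$ and $s$ are illustrative assumptions, not part of the paper.

```python
import math


def scaled_geometric_tail(p: float, s: float) -> float:
    """Return P(p * N > s) for N ~ Geometric(p) supported on {1, 2, ...}."""
    # p*N > s  <=>  N > floor(s/p), and P(N > k) = (1 - p)**k for integer k >= 0.
    k = math.floor(s / p)
    return (1.0 - p) ** k


if __name__ == "__main__":
    s = 1.0  # illustrative tail point
    for p in (0.1, 0.01, 0.001, 0.0001):
        # The second column should approach exp(-s) as p decreases.
        print(p, scaled_geometric_tail(p, s), math.exp(-s))
```

The gap between the two tails shrinks roughly in proportion to $p$, which is the sense in which $p_A \to 0$ drives the exponential limit.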

3. Proof 

In order to prove Theorem 1 we need the following lemmas. We use the notation of the previous
section, and we assume that the conditions of Theorem 1 are satisfied.

Lemma 1. The quasi-stationary distribution

$$H_A(y) = \lim_{n \to \infty} \mathsf{P}^x\{X(n) \le y \mid N_A^x > n\}$$

converges, as $A \to \infty$, to the stationary distribution $H(y)$ at all continuity points $y$ of $H$.

Proof. Follows directly from Theorem 1 of Pollak and Siegmund (1986). □

Recall that $N_A^{H_A}$ is the stopping time (2.1) when the Markov process $X(n)$ starts from the
random point that has the quasi-stationary distribution $H_A$, i.e., $X(0) \sim H_A$.

Lemma 2. The distribution of $N_A^{H_A}$ is Geometric($p_A$), where $p_A = \mathsf{P}^{H_A}\{X(1) > A\}$. Hence
$p_A \mathsf{E} N_A^{H_A} = 1$ and $p_A N_A^{H_A}$ converges in distribution to Exponential(1) as $A \to \infty$.

Proof. Since the Markov process is Harris-recurrent, there is no absorbing state, so that
$\mathsf{P}(N_A^{H_A} = \infty) = 0$. Therefore, the geometric property of $N_A^{H_A}$ is obvious. Lemma 1 and the assumption that
the support of $H$ is $[0, \infty)$ guarantee that $p_A \to 0$. □

Lemma 3. Let $X^x(n)$ denote a process that starts from $x$ and has the same transition probabilities
as $X(n)$. Let $0 \le x \le y < \infty$. There exists a sample space with $X^x(n)$ and $X^y(n)$ such that
$X^y(n) \ge X^x(n)$ for all $n \ge 1$.

Proof. Clearly $X^y(1)$ is stochastically larger than $X^x(1)$, so that one can construct a sample space
where $X^y(1) \ge X^x(1)$. To complete the proof, continue by induction on $n$. □

Lemma 4. Let $0 \le x < y < \infty$. Let $X^x(n)$ and $X^y(n)$ be independent Markov processes started
at $x$ and $y$ respectively, both having the same transition probabilities as $X(n)$. Then

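$$\mathsf{P}\{X^x(n) \ge X^y(n) \text{ for at least one value of } n\} = 1. \tag{3.1}$$

Proof. Let $0 < \varepsilon < 1/4$ and $y < B < \infty$ be such that $H\{(B, \infty)\} = \varepsilon$. Let $w_\varepsilon$ be such that

$$|\mathsf{P}\{X^0(w_\varepsilon) \le z\} - H(z)| < \varepsilon \quad \text{for all } z$$

and

$$|\mathsf{P}\{X^B(w_\varepsilon) \le z\} - H(z)| < \varepsilon \quad \text{for all } z.$$

By virtue of Lemma 3,

$$|\mathsf{P}\{X^u(w_\varepsilon) \le z\} - H(z)| < \varepsilon \quad \text{for all } z \text{ and all } 0 \le u \le B.$$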

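Write $m$ for the median of the stationary distribution $H$. Obviously,

$$\mathsf{P}\bigl(\{B > X^x(w_\varepsilon) \vee X^y(w_\varepsilon)\} \setminus \{B > X^x(w_\varepsilon) > m,\ X^y(w_\varepsilon) < m\}\bigr) \le \tfrac{3}{4} - \varepsilon$$

and

$$(\tfrac{1}{2} - 2\varepsilon)^2 \le (\tfrac{1}{2} - 2\varepsilon)(\tfrac{1}{2} - \varepsilon) \le \mathsf{P}\{B > X^x(w_\varepsilon) > m,\ X^y(w_\varepsilon) < m\} \le (\tfrac{1}{2} + \varepsilon)^2.$$

Similarly, for any $j \ge 2$ when $u < v$

$$(\tfrac{1}{2} + \varepsilon)^2 \ge \mathsf{P}\{X^x(jw_\varepsilon) > m,\ X^y(jw_\varepsilon) < m \mid X^x((j-1)w_\varepsilon) = u,\ X^y((j-1)w_\varepsilon) = v\}$$

and

$$\mathsf{P}\bigl(\{B > X^x(jw_\varepsilon) \vee X^y(jw_\varepsilon)\} \setminus \{B > X^x(jw_\varepsilon) > m,\ X^y(jw_\varepsilon) < m\}\bigr) \le (1 - \varepsilon)^2 - (\tfrac{1}{2} - \varepsilon)^2 = \tfrac{3}{4} - \varepsilon.$$

Let $T_B = \min\{j : X^x(jw_\varepsilon) \vee X^y(jw_\varepsilon) > B\}$.
Using the previous inequalities, we obtain

$$\mathsf{P}\{B > X^x(jw_\varepsilon) > X^y(jw_\varepsilon) \text{ for some } 1 \le j < T_B\} \ge (\tfrac{1}{2} - 2\varepsilon)^2 \sum_{i=0}^{\infty} (\tfrac{3}{4} - \varepsilon)^i = \frac{(1 - 4\varepsilon)^2}{1 + 4\varepsilon}.$$

Letting $\varepsilon \to 0$ completes the proof. □

Lemma 5. Using the same notation as in Lemma 4,

$$\mathsf{P}\{X^x(\ell) \ge X^y(\ell) \text{ for some } \ell \le n\} \to 1 \quad \text{as } n \to \infty$$

uniformly in $0 \le x < y \le B$.

Proof. This follows directly from Lemma 4 and its proof. □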

Lemma 6. Let $\varepsilon > 0$ and let $0 < B < \infty$ be such that $H\{(B, \infty)\} < \varepsilon$. Let $B < A < \infty$. Then

$$H_A\{(B, A)\} < \varepsilon.$$

Proof. The lemma follows from the fact that $H_A(y) \ge H(y)$ for all $y \ge 0$ (cf. Theorem 1 of Pollak
and Siegmund, 1986). □

Proof of Theorem 1(i). Let $N_A^{H_A} = \min\{n : X(n) > A\}$ where $X(0) \sim H_A$. By
Lemma 2, $N_A^{H_A} \sim$ Geometric($p_A$) and

$$\lim_{A \to \infty} \mathsf{P}(p_A N_A^{H_A} > s) = e^{-s}, \quad s \ge 0.$$

Let $\varepsilon > 0$. Let $0 < B < \infty$ be such that $H\{(B, \infty)\} < \varepsilon$. Using the notation of Lemma 4, let
$0 < q_B < \infty$ be such that

$$\mathsf{P}\bigl(X^0(n) \ge X^B(n) \text{ for some } n \le q_B\bigr) \ge 1 - \varepsilon. \tag{3.2}$$

By virtue of Lemma 1 and Lemma 2 there exists $A_\varepsilon$ such that for all $A \ge A_\varepsilon$

$$|H_A(x) - H(x)| < \varepsilon \quad \text{for all } 0 \le x \le B \tag{3.3}$$

and

$$|\mathsf{P}(p_A N_A^{H_A} > s) - e^{-s}| < \varepsilon \quad \text{for all } 0 \le s < \infty. \tag{3.4}$$

Because the support of $H$ is $[0, \infty)$, it follows from (3.3) that $p_A q_B \to 0$ as $A \to \infty$.

Next, we construct the following sample space. Let $X^0(n)$ be a Markov process (with the same transition
probabilities as $X(n)$) starting at 0 and let $X^B(n)$ be a Markov process starting at $B$ such that they
are independent until the first time that $X^0(n) \ge X^B(n)$. Denote this time by $\tau$. After $\tau$, let
$X^0$, $X^B$ be such that $X^0(n) \ge X^B(n)$ for all $n \ge \tau$. (This construction is feasible by virtue of
Lemma 3 and Lemma 4.)

By virtue of equation (3.2), $\mathsf{P}(\tau \le q_B) \ge 1 - \varepsilon$. Denote

$$N_A^0 = \min\{n \ge 1 : X^0(n) > A\} \quad \text{and} \quad N_A^B = \min\{n \ge 1 : X^B(n) > A\}.$$

Note that $N_A^x$ is stochastically larger than $N_A^y$ if $x \le y$.

Now, fix $0 < s < \infty$ and let $A_s$ be large enough so that $p_A q_B < s$ for all $A \ge A_s$. Then we
have the following chain of equalities and inequalities:

$$\mathsf{P}(p_A N_A^B > s) \ge \cdots \ge \mathsf{P}(p_A N_A^0 > s) - \varepsilon. \tag{3.5}$$

On the other hand,

$$\begin{aligned}
\mathsf{P}(p_A N_A^B > s) &= \mathsf{P}(p_A N_A^{H_A} > s \mid X(0) = B) \\
&\le \mathsf{P}(p_A N_A^{H_A} > s \mid X(0) \le B) \\
&= \frac{\mathsf{P}(p_A N_A^{H_A} > s,\ X(0) \le B)}{\mathsf{P}(X(0) \le B)} \\
&= \frac{\mathsf{P}(p_A N_A^{H_A} > s,\ X(0) \le B)}{H_A([0, B])} \\
&\le \frac{\mathsf{P}(p_A N_A^{H_A} > s)}{H_A([0, B])}.
\end{aligned}$$



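Since by the definition of $B$ and Lemma 6, $H_A([0, B]) \ge 1 - \varepsilon$, and by equation (3.4),
$\mathsf{P}(p_A N_A^{H_A} > s) \le e^{-s} + \varepsilon$, we obtain

$$\mathsf{P}(p_A N_A^B > s) \le \frac{e^{-s} + \varepsilon}{1 - \varepsilon}. \tag{3.6}$$

Also, since $\mathsf{P}(X(0) \ge 0) = H_A([0, A]) = 1$,

$$\begin{aligned}
\mathsf{P}(p_A N_A^0 > s) &= \mathsf{P}(p_A N_A^{H_A} > s \mid X(0) = 0) \\
&\ge \mathsf{P}(p_A N_A^{H_A} > s \mid X(0) \ge 0) \\
&= \frac{\mathsf{P}(p_A N_A^{H_A} > s,\ X(0) \ge 0)}{\mathsf{P}(X(0) \ge 0)} \\
&= \mathsf{P}(p_A N_A^{H_A} > s) \\
&\ge e^{-s} - \varepsilon,
\end{aligned} \tag{3.7}$$

where the last inequality follows from equation (3.4).
Putting (3.5) and (3.7) together yields

$$\mathsf{P}(p_A N_A^B > s) \ge e^{-s} - 2\varepsilon, \tag{3.8}$$

and putting (3.5) and (3.6) together yields

$$\mathsf{P}(p_A N_A^0 > s) \le \frac{e^{-s} + \varepsilon}{1 - \varepsilon} + \varepsilon. \tag{3.9}$$

Since for all $0 \le x \le B$,

$$\mathsf{P}(p_A N_A^B > s) \le \mathsf{P}(p_A N_A^x > s) \le \mathsf{P}(p_A N_A^0 > s), \tag{3.10}$$

equations (3.8)-(3.10) imply that

$$e^{-s} - 2\varepsilon \le \mathsf{P}(p_A N_A^x > s) \le \frac{e^{-s} + \varepsilon}{1 - \varepsilon} + \varepsilon \quad \text{for all } 0 \le x \le B.$$

Finally, fix $x$ and let $\varepsilon \to 0$, so that ultimately $B > x$. This completes the proof of
Theorem 1(i).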

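Proof of Theorem 1(ii). Since $N_A^{H_A}$ is distributed Geometric($p_A$), $p_A N_A^{H_A}$ has a moment
generating function

$$M_A^{H_A}(t) = \mathsf{E} e^{t p_A N_A^{H_A}}, \quad t < 1,$$

and it is easy to see that

$$M_A^{H_A}(t) \to \frac{1}{1 - t} \quad \text{as } A \to \infty \text{ for } t < 1. \tag{3.11}$$

Obviously,

$$M_A^{H_A}(t) = \mathsf{E}\,\mathsf{E}\bigl(e^{t p_A N_A^{X(0)}} \mid X(0)\bigr),$$

where $X(0)$ has distribution $H_A$. It follows that for every initial state $x \ge 0$ and all $t < 1$ the value
of $p_A N_A^x$ has a moment generating function

$$M_A^x(t) = \mathsf{E} e^{t p_A N_A^x}$$

and

$$M_A^{H_A}(t) = \mathsf{E} M_A^{X(0)}(t) = \int_0^A M_A^x(t)\, H_A(dx).$$

For $t \le 0$, by virtue of Theorem 1(i),

$$M_A^x(t) \to \frac{1}{1 - t} \quad \text{as } A \to \infty.$$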

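Let $0 < \varepsilon < 1$ and $C > 0$ be such that $H\{[0, C)\} = \varepsilon$. For fixed $0 < t < 1$, let $A(\varepsilon) > C$ be
such that

$$1 - \varepsilon < (1 - t)\, M_A^{H_A}(t) < 1 + \varepsilon \quad \text{whenever } A \ge A(\varepsilon).$$

Recall that $X(0)$ has distribution $H_A$, which is a quasi-stationary distribution.
For any $0 < \gamma < \infty$, Markov's inequality yields

$$\mathsf{P}\bigl(M_A^{X(0)}(t) \ge \gamma M_A^{H_A}(t)\bigr) \le 1/\gamma,$$

so that for $A \ge A(\varepsilon)$

$$\mathsf{P}\Bigl(M_A^{X(0)}(t) \ge \frac{\gamma(1 + \varepsilon)}{1 - t}\Bigr) \le \frac{1}{\gamma}. \tag{3.12}$$

Substituting $\gamma = (1 + \varepsilon)/\varepsilon$ in (3.12) yields

$$\mathsf{P}\Bigl(M_A^{X(0)}(t) \ge \frac{(1 + \varepsilon)^2}{\varepsilon(1 - t)}\Bigr) \le \frac{\varepsilon}{1 + \varepsilon}.$$

Since, by Lemma 6, $\varepsilon = H\{[0, C)\} \le H_A\{[0, C)\}$, it follows that for
$M_A^{X(0)}(t) \ge (1 + \varepsilon)^2 / (\varepsilon(1 - t))$, the value of $X(0)$ cannot exceed $C$. In other words,

$$M_A^x(t) \le \frac{(1 + \varepsilon)^2}{\varepsilon(1 - t)} \quad \text{for } x \ge C \text{ and all } A \ge A(\varepsilon). \tag{3.13}$$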

Let $\beta = \min\{n : X(n) > C\}$. Obviously,

$$M_A^0(t) = \mathsf{E} e^{t p_A N_A^0} \le \mathsf{E} e^{t p_A \beta} \cdot \mathsf{E} e^{t p_A N_A^C}. \tag{3.14}$$

Let $\delta_\varepsilon = \mathsf{P}\{X^0(1) > C\}$. Clearly $\delta_\varepsilon \to \mathsf{P}\{X^0(1) > 0\} > 0$ as $\varepsilon \to 0$.
Due to the monotonicity of the process $X(n)$, $\beta$ is bounded by a Geometric($\delta_\varepsilon$)-distributed
random variable, so that for $0 < t < 1$

$$1 \le \mathsf{E} e^{t p_A \beta} \le \mathsf{E} e^{t p_A \mathrm{Geometric}(\delta_\varepsilon)} = \frac{\delta_\varepsilon e^{t p_A}}{1 - (1 - \delta_\varepsilon) e^{t p_A}}.$$

It follows that $\mathsf{E} e^{t p_A \beta}$ is bounded as $A \to \infty$ (since $p_A \to 0$). Since $\mathsf{E} e^{t p_A N_A^C} = M_A^C(t)$,
equations (3.13) and (3.14) imply that $M_A^0(t)$ is also bounded as $A \to \infty$.

Denote $\varphi(t) = \limsup_{A \to \infty} M_A^0(t) < \infty$. Let $\{A_i\}_{i \ge 1}$ be a sequence such that
$\lim_{i \to \infty} M_{A_i}^0(t) = \varphi(t)$. Construct a set $\{t_j\}_{j \ge 1}$ dense in $(0, t)$. Because $M_A^0(u)$ is monotone in $u$,
one can obtain a subsequence $\{A_{i_j}\}$ of $\{A_i\}$ such that $M_{A_{i_j}}^0(u)$ converges as $j \to \infty$ for all
$0 \le u \le t$. Since the limit is a moment generating function, by Theorem 1(i) it must be $1/(1 - t)$.
The same argument can be applied to $\liminf_{A \to \infty} M_A^0(t)$.

It follows that the limit $\lim_{A \to \infty} M_A^0(t)$ exists and is equal to $1/(1 - t)$ for all $t < 1$. Because
$M_A^x(t)$ is monotone in $x$ and because of (3.11), $\lim_{A \to \infty} M_A^x(t)$ necessarily equals $1/(1 - t)$ for all
$t < 1$ and every fixed $x \in [0, \infty)$. This completes the proof of Theorem 1(ii).
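The limit (3.11) can be illustrated numerically from the closed-form moment generating function of a geometric variable on $\{1, 2, \dots\}$, namely $p e^{u} / (1 - (1 - p) e^{u})$ evaluated at $u = t p$. The sketch below is only an illustration; the function name and the sample values of $p$ and $t$ are assumptions.

```python
import math


def mgf_scaled_geometric(p: float, t: float) -> float:
    """Return E exp(t * p * N) for N ~ Geometric(p) on {1, 2, ...}.

    The expectation is finite only when (1 - p) * exp(t * p) < 1.
    """
    u = t * p
    denom = 1.0 - (1.0 - p) * math.exp(u)
    if denom <= 0.0:
        raise ValueError("moment generating function is infinite at this t")
    return p * math.exp(u) / denom


if __name__ == "__main__":
    t = 0.5  # illustrative argument, t < 1
    for p in (0.1, 0.01, 0.001):
        # The middle column should approach the Exponential(1) value 1 / (1 - t).
        print(p, mgf_scaled_geometric(p, t), 1.0 / (1.0 - t))
```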

4. Remarks 

1. Let $G$ be a distribution with support $[0, A]$ and define the operator $T$ as

$$T(G) = \text{the distribution of } X(1) \text{ conditioned on } \{X(1) \le A,\ X(0) \sim G\}.$$

If $T$ is a continuous operator (in the weak* topology on the distribution functions over $[0, A]$), then
a quasi-stationary distribution exists, i.e., Condition C2 in Theorem 1 is satisfied (cf. Harris, 1963,
Theorem III.10.1).

2. Even if $T$ is not a continuous operator, sometimes Condition C2 can be verified by solving
for $T(G) = G$ and arguing that this is the quasi-stationary distribution. For an example, see Pollak
(1985).

3. The proof can be modified easily to extend Theorem 1 to the case where the support of the
stationary distribution $H$ is $[c, \infty)$ for some $c > 0$ (i.e., the set $[0, c)$ is not in the state space or is
transient).



5. Examples and Applications 

Theorem 1 can be applied to a number of popular Harris-recurrent Markov processes. Below we
present two examples. These are of interest when applying certain change-detection procedures.

5.1. Example 1: An Additive-Multiplicative Markov Process 

Let $\Lambda_1, \Lambda_2, \dots$ be non-negative continuous independent and identically distributed (i.i.d.)
random variables with $\beta = \mathsf{E}\Lambda_1$ and $\mu = \mathsf{E} \log \Lambda_1$. For $x \ge 0$, define recursively:

$$X(0) = x, \quad X(n) = (1 + X(n-1))\Lambda_n, \quad n = 1, 2, \dots. \tag{5.1}$$

This process is of interest in a number of applications (cf. Kesten, 1973; Pollak, 1985, 1987). For
example, in the problem of detecting a change in distribution, the Shiryaev-Roberts statistic can be
written as (cf. Pollak, 1985, 1987)

$$R(n) = (1 + R(n-1)) \frac{f_{\theta_1}(Y_n)}{f_{\theta_0}(Y_n)}, \quad R(0) = 0, \tag{5.2}$$

where $\{Y_n, n \ge 1\}$ are independent, having probability density $f_{\theta_0}$ before a change and putative
density $f_{\theta_1}$ after a change; $\theta_0$ and $\theta_1$ are fixed parameters, and one stops and declares that the
change is in effect at $N_A = \min\{n : R(n) > A\}$.
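The recursion (5.2) and its stopping rule translate directly into a short loop. The following is a minimal sketch that takes a stream of likelihood ratios $f_{\theta_1}(Y_n)/f_{\theta_0}(Y_n)$ as input; the function name and the choice to return `None` when the stream ends before the threshold is crossed are illustrative assumptions.

```python
from typing import Iterable, Optional


def shiryaev_roberts_stopping_time(
    likelihood_ratios: Iterable[float], threshold: float
) -> Optional[int]:
    """Return the first n with R(n) > threshold, or None if the input is exhausted first."""
    r = 0.0  # R(0) = 0
    for n, lr in enumerate(likelihood_ratios, start=1):
        r = (1.0 + r) * lr  # recursion (5.2)
        if r > threshold:
            return n
    return None


if __name__ == "__main__":
    # With all likelihood ratios equal to 1 the statistic is simply R(n) = n.
    print(shiryaev_roberts_stopping_time([1.0] * 10, threshold=3.5))
```

Keeping only the current value of $R(n)$ reflects the Markov structure used throughout: the statistic can be updated online without storing past observations.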

When $\mu < 0$, the process $\{X(n)\}$ is Harris-recurrent and has a stationary distribution (for any
$x \ge 0$). To see this, note that $X(n)$ can be written as

$$X(n) = \sum_{k=0}^{n} \prod_{i=k}^{n} \Lambda_i = \sum_{k=0}^{n} \exp\Bigl\{\sum_{i=k}^{n} \log \Lambda_i\Bigr\},$$

where $\Lambda_0 = x$. Obviously,

$$\sum_{k=0}^{n} \exp\Bigl\{\sum_{i=k}^{n} \log \Lambda_i\Bigr\} \overset{d}{=} \sum_{k=1}^{n} \exp\Bigl\{\sum_{i=1}^{k} \log \Lambda_i\Bigr\} + x \exp\Bigl\{\sum_{i=1}^{n} \log \Lambda_i\Bigr\},$$

where the right-hand side converges (for every $x \ge 0$ as $n \to \infty$) to the random variable

$$\sum_{k=1}^{\infty} \exp\Bigl\{\sum_{i=1}^{k} \log \Lambda_i\Bigr\},$$

which is a.s. finite when $\mu < 0$. Since we assumed above that $\Lambda_1$ is continuous, the quasi-stationary
distribution exists (see Remark 1 in Section 4). It follows from Theorem 1 that a suitably
standardized version of the first exceedance time over $A$ (i.e., $p_A N_A^x$) is asymptotically exponentially
distributed.

Note that while using the conventional regeneration argument is perhaps possible, embedding
the Markov process (5.1) into "regenerative cycles" is by no means straightforward or obvious,
especially when $1 < \beta = \mathsf{E}\Lambda_1 < \infty$ and $\mu = \mathsf{E} \log \Lambda_1 < 0$. This case
does have meaning for applications. For example, consider the aforementioned change detection
problem. When there never is a change, the observations $Y_i$, $i \ge 1$ have density $f_{\theta_0}$, so that
$\beta = \int [f_{\theta_1}(y)/f_{\theta_0}(y)] f_{\theta_0}(y)\,dy = 1$ while by Jensen's inequality $\mu = \int \log[f_{\theta_1}(y)/f_{\theta_0}(y)] f_{\theta_0}(y)\,dy < 0$.
If there is a change (for argument's sake let it be in effect from the very beginning), the observations
$Y_i$, $i \ge 1$ have density $f_\theta$ (not necessarily $f_{\theta_1}$; the post-change parameter is seldom known in
advance, and the putative $\theta_1$ is merely a representation of a "meaningful" change). For $\theta$ close to
$\theta_0$, one would obtain $\beta = \int [f_{\theta_1}(y)/f_{\theta_0}(y)] f_\theta(y)\,dy > 1$ and $\mu = \int \log[f_{\theta_1}(y)/f_{\theta_0}(y)] f_\theta(y)\,dy < 0$.

Before going into further details, we discuss an issue related to computing $p_A$, the standardizing
factor. If $p_A$ were amenable to direct calculation, one could use this to approximate $\mathsf{E} N_A^x \approx 1/p_A$.
Unfortunately, in most cases direct evaluation of $p_A$ is not tractable, and evaluation of $\mathsf{E} N_A^x$ has to
be done by other methods. (But see Pollak, 1985, and Mevorach and Pollak, 1991 for examples
that allow some tractability.) Nonetheless, evaluation of $p_A$ is of interest on its own merits (cf.
Tartakovsky, 2005), as $p_A$ is an approximation of the probability that there will be a first upcrossing
of the threshold $A$ at a specified time $n$, and $1 - (1 - p_A)^m$ is an approximation of the probability
that there will be a first upcrossing of $A$ in a given stretch of $m$ observations (i.e., for the "local
false alarm probability" $\mathsf{P}(n \le N_A^x \le n + m - 1 \mid N_A^x \ge n)$). Therefore, if $\mathsf{E} N_A^x$ can be evaluated,
$p_A$ can be approximated by $1/\mathsf{E} N_A^x$.
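The approximation of the local false alarm probability by $1 - (1 - p_A)^m$ with $p_A \approx 1/\mathsf{E} N_A^x$ amounts to a one-line computation. The sketch below is illustrative only; the function name and the sample values (an average run length of 400 and a window of 10 observations) are assumptions rather than values prescribed by the text.

```python
def local_false_alarm_probability(arl: float, m: int) -> float:
    """Approximate the chance of a false alarm within m observations, given no earlier alarm.

    Uses p = 1 / arl as the per-step alarm probability and returns 1 - (1 - p)**m.
    """
    if arl <= 1.0 or m < 1:
        raise ValueError("expects arl > 1 and m >= 1")
    p = 1.0 / arl
    return 1.0 - (1.0 - p) ** m


if __name__ == "__main__":
    # Hypothetical inputs: average run length 400, window of 10 observations.
    print(local_false_alarm_probability(400.0, 10))
```

For small $p_A$ and moderate $m$ the result is close to $m\,p_A$, which is why the average run length alone determines these local probabilities once asymptotic exponentiality holds.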

Suppose now that $\beta = \mathsf{E}\Lambda_1 = 1$. Let $f_0$ be the density of $\Lambda_1$ and define $f_1(\lambda) = \lambda f_0(\lambda)$.
(Since $\mathsf{E}\Lambda = 1$, it follows that $f_1$ is a bona fide probability density.) Note that $\Lambda$ is a likelihood
ratio, $\Lambda = f_1(\Lambda)/f_0(\Lambda)$. It follows from Pollak (1987) (see also Tartakovsky and Veeravalli, 2005)
that

$$\mathsf{E}_0 N_A^x = \gamma^{-1} A (1 + o(1)) \quad \text{as } A \to \infty, \tag{5.3}$$

where $\mathsf{E}_0$ is the expectation with respect to the density $f_0$ and $\gamma$ is a constant that can be calculated
by renewal theory (cf. Woodroofe, 1982; Siegmund, 1985), so that $p_A \sim \gamma/A$. See the Remark at the
end of Section 5.2 for evaluation of $p_A$ when $\mathsf{E}\Lambda_1 \ne 1$.

5.2. Example 2: A Reflected Random Walk 

Let $\{Z_n\}_{n \ge 1}$ be a sequence of i.i.d. continuous random variables with a negative mean
$\mu = \mathsf{E} Z_n < 0$. For $n \ge 1$, define

$$X(n) = \max\{0, X(n-1) + Z_n\}, \quad X(0) = x \ge 0. \tag{5.4}$$

Since $\mu < 0$, the Markov process $\{X(n)\}$ is Harris-recurrent and has a stationary distribution. To
see this, note that

$$X(n) = \max\{0,\ Z_1 + \cdots + Z_n + x,\ Z_2 + \cdots + Z_n,\ \dots,\ Z_n\}.$$

Write $S_i = \sum_{k=1}^{i} Z_k$, $S_0 = 0$. Since the vector $(Z_1, \dots, Z_n)$ has the same distribution as
$(Z_n, \dots, Z_1)$, it follows that

$$X(n) \overset{d}{=} \max\{\max\{0, S_1, S_2, \dots, S_{n-1}\},\ x + S_n\},$$

where the right-hand side converges (as $n \to \infty$ for any $x \ge 0$) to the random variable $\max_{i \ge 0} S_i$,
which is a.s. finite whenever $\mu = \mathsf{E} Z_1 < 0$.
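The pathwise identity between the recursion (5.4) and the maximum over partial sums taken backwards from $Z_n$ can be checked on a simulated path. This is a minimal sketch; the function names and the Gaussian increments with mean $-0.5$ are illustrative assumptions (the text only requires continuous increments with negative mean).

```python
import random
from typing import List


def reflected_walk(x: float, steps: List[float]) -> float:
    """Iterate X(n) = max(0, X(n-1) + Z_n) from X(0) = x >= 0 and return X(n)."""
    state = x
    for z in steps:
        state = max(0.0, state + z)
    return state


def max_representation(x: float, steps: List[float]) -> float:
    """Return max{0, Z_n, Z_n + Z_{n-1}, ..., Z_n + ... + Z_2, Z_n + ... + Z_1 + x}."""
    best = 0.0
    suffix = 0.0
    for z in reversed(steps):
        suffix += z                # Z_k + ... + Z_n for decreasing k
        best = max(best, suffix)
    # The full sum carries the initial state; since x >= 0 this term dominates
    # the x-free full sum already included above.
    return max(best, suffix + x)


if __name__ == "__main__":
    rng = random.Random(0)
    path = [rng.gauss(-0.5, 1.0) for _ in range(200)]
    print(reflected_walk(2.0, path), max_representation(2.0, path))
```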

The process (5.4) describes a broad class of single-channel queuing systems (see, e.g., Borovkov,
1976) as well as a popular cumulative sum decision statistic for detecting a change in distribution
(Page, 1954) and has been studied extensively by itself, outside the framework of general Markov
processes. For instance, for $x = 0$, the asymptotic exponentiality of the stopping time

$$N_a^{x=0} = \min\{n \ge 1 : X(n) > a\}, \quad a > 0 \tag{5.5}$$

(as $a \to \infty$) has been proven by Khan (1995), which can be generalized easily for any $x \ge 0$. (The
process $\{X(n)\}$ obviously is a renewal process, so, although our Theorem 1 covers this example
when the conditions C1 and C2 are satisfied, it is not needed to prove asymptotic exponentiality of
$N_a^x$, as it can be derived from general results; cf. Asmussen, 2003, Ch. VI.)

Assume for simplicity that $x = 0$. If there exists a positive $\omega$ such that $\mathsf{E} e^{\omega Z_1} = 1$, let $f_0(z)$
be the density of $Z_1$ and define $f_1(z) = e^{\omega z} f_0(z)$. Since $\mathsf{E} e^{\omega Z_1} = 1$, it follows that $f_1$ is a bona
fide probability density, and $f_1(Z)/f_0(Z) = e^{\omega Z}$ is a likelihood ratio. Hence, assuming that
$\mu_1 = \int \log[f_1(z)/f_0(z)] f_1(z)\,dz < \infty$ and letting

$$N_a = \min\{n \ge 1 : \max(0, \omega X(n-1) + \omega Z_n) > \omega a\},$$

standard renewal-theoretic methods (cf. Woodroofe, 1982; Siegmund, 1985) readily apply to obtain
that

$$\mathsf{E} N_a = \delta^{-1} e^{\omega a}(1 + o(1)) \quad \text{as } a \to \infty, \tag{5.6}$$

so that $p_a \sim \delta e^{-\omega a}$. Here $0 < \delta < 1$ is a constant that can be computed explicitly by a
renewal-theoretic argument (cf. Tartakovsky, 2005).

Remark. Clearly, $N_a^x$ of Example 2 is larger than $N_A^x$ of Example 1 (with $A = e^a$), so that
$\mathsf{E} N_A^x \le \delta^{-1} A^{\omega}(1 + o(1))$. Theorem 5 of Kesten (1973) as well as Theorem 4 of Borovkov and
Korshunov (1996) imply that

$$\mathsf{P}(X(\infty) > y) = C/y^{\omega}\,(1 + o(1)) \quad \text{as } y \to \infty,$$

where $X(\infty)$ is a random variable that has the stationary distribution of $\{X(n)\}$ and $C$ is a positive
finite constant. Note that $X(\infty)$ is stochastically larger than a random variable that has the
quasi-stationary distribution. Therefore, the first upcrossing over $A$ of the process $X(n)$ starting at a
random $X(0)$ distributed like $X(\infty)$ will occur no later than the first upcrossing over $A$ of the
process $X(n)$ starting at a random $X(0)$ that has the quasi-stationary distribution. The proportion
of times that the former exceeds $A$ is $\mathsf{P}(X(\infty) > A)$. It follows that $\mathsf{E} N_A^x \ge C^{-1} A^{\omega}(1 + o(1))$,
so that $p_A$ has an order of magnitude $1/A^{\omega}$.

5.3. Applications to Sequential Change-Point Detection and a Monte Carlo Experiment 

The importance of the asymptotic exponentiality of the run length in sequential change-point 
detection methods is twofold. First, it shows that the mean time to false alarm (the so-called 
average run length), which is a popular measure of the false alarm rate, is indeed an exhaustive 
performance metric. Second, the result can be used for the evaluation of the local false alarm
probabilities of the corresponding detection schemes (see Example 1 above; see Tartakovsky (2005)
for a more detailed discussion of the importance of local false alarm probabilities in a variety of
applications).

To be more specific, assume that there is a sequence of i.i.d. variables (observations) $Y_1, Y_2, \dots$
that follow the density $f_0(y)$ under the no-change hypothesis (the in-control mode) and the
density $f_1(y)$ after the change occurs (the out-of-control mode). The change occurs at an unknown
point in time $\nu$, $1 \le \nu \le \infty$. Therefore, conditioned on $\nu = k$, $Y_n \sim f_0(y)$ for $n < k$ and
$Y_n \sim f_1(y)$ for $n \ge k$. We write $\mathsf{P}_\infty$ ($\mathsf{E}_\infty$) and $\mathsf{P}_k$ ($\mathsf{E}_k$) respectively for probability measures
(expectations) when there is no change (i.e., $\nu = \infty$) and when the change occurs at point $k$. Let
$Z_n = \log[f_1(Y_n)/f_0(Y_n)]$ be the corresponding log-likelihood ratio and let $S_n = \sum_{i=1}^{n} Z_i$. Let
$I_1 = \mathsf{E}_1 Z_1$ and $I_0 = \mathsf{E}_\infty(-Z_1)$ be the Kullback-Leibler information numbers, which are assumed
finite.

We begin with the cumulative sum (CUSUM) test. The CUSUM statistic is given by the
recursion (5.4) and the corresponding stopping time is defined in (5.5). The difference from the
previous section is that $Z_n$, $n = 1, 2, \dots$ are not arbitrary random variables with negative mean,
but rather log-likelihood ratios with mean $\mu = -I_0$. This simplifies most of the calculations, since
$\mathsf{E} e^{Z_n} = 1$. Recall that in this section we denote this expectation by $\mathsf{E}_\infty$.

Rewrite the corresponding stopping time in the following form:

$$N_A = \min\{n \ge 1 : \max\{1, W(n-1) e^{Z_n}\} > A\}, \tag{5.7}$$

where $W(n) = e^{X(n)}$, $W(0) = 1$ and $A = e^{a}$. The asymptotic approximation for the average run length to false
alarm (5.6) holds with $\omega = 1$, $e^{\omega a} = A$, and $\delta = I_1 \gamma^2$ (cf. Tartakovsky, 2005), which implies that
$p_A \sim I_1 \gamma^2 / A$. Here $\gamma = \lim_{y \to \infty} \mathsf{E}_1 \exp\{-(S_{\tau_y} - y)\}$, where $\tau_y = \min\{n : S_n \ge y\}$ is the first
time when the random walk $S_n = \sum_{i=1}^{n} Z_i$ crosses the level $y$. The constant $\gamma$ is the subject of
renewal theory (cf. Woodroofe, 1982 or Siegmund, 1985) and can be computed explicitly.

We now proceed with the Shiryaev-Roberts detection test. The Shiryaev-Roberts statistic is
defined by (5.2), where $f_{\theta_1}(Y_n)/f_{\theta_0}(Y_n) = e^{Z_n}$ and $R(0) = 0$. The corresponding stopping time is

$$\hat{N}_A = \min\{n \ge 1 : R(n) > A\}.$$

We now denote it by $\hat{N}_A$ to distinguish it from the CUSUM stopping time in the following
calculations and comparison.

Since $\mathsf{E}_\infty e^{Z_n} = 1$, the process $R(n) - n$ is a zero-mean martingale, which allows us to
approximate the average run length to false alarm:

$$\mathsf{E}_\infty \hat{N}_A \sim \gamma^{-1} A \quad \text{as } A \to \infty.$$

This approximation follows from (5.3) above. The distribution of the Shiryaev-Roberts stopping
time is approximately Exponential($p_A$) with $p_A \sim \gamma/A$. (The asymptotic exponentiality of the
suitably standardized run length to false alarm has been shown by Yakir, 1995.)

In order to verify the accuracy of asymptotic approximations for reasonable values of the
threshold $A$, we performed Monte Carlo (MC) simulations for the following example. Consider
the case where observations are independent, originally having an Exponential(1) distribution,
changing at an unknown time to Exponential($1/(1 + q)$), i.e.,

$$f_0(y) = e^{-y}\,\mathbb{1}_{\{y \ge 0\}}, \quad f_1(y) = \frac{1}{1 + q}\, e^{-y/(1+q)}\,\mathbb{1}_{\{y \ge 0\}}, \quad q > 0. \tag{5.8}$$

In this case

$$I_1 = q - \log(1 + q) \quad \text{and} \quad \gamma = 1/(1 + q).$$

Applying Example 1, the likelihood ratio is $\Lambda_n = e^{Z_n} = (1 + q)^{-1} e^{q Y_n/(1+q)}$ and the average run
length (ARL) to false alarm of the Shiryaev-Roberts procedure is

$$\mathrm{ARL}_{SR}(A) = \mathsf{E}_\infty \hat{N}_A \approx (1 + q) A. \tag{5.9}$$

Applying Example 2, an approximation of the ARL to false alarm of the CUSUM test is

$$\mathrm{ARL}_{CU}(A) = \mathsf{E}_\infty N_A \approx \frac{(1 + q)^2 A}{q - \log(1 + q)}. \tag{5.10}$$



Table 1: The ARL versus threshold for the CUSUM test for q = 3

A             1.2     1.7     2.5     4.6     9.2     13.0     17.1     21       41
FO ARL_CU     11.90   16.86   24.79   45.61   91.22   128.90   169.55   208.22   406.52
HO ARL_CU     7.96    12.36   19.69   39.56   84.07   121.21   161.43   199.77   397.02
MC ARL_CU     8.04    12.45   19.79   39.57   84.33   121.23   161.88   200.44   397.16
MC SD(N_A)    7.49    11.88   19.18   38.61   83.21   119.73   159.91   198.97   396.84

Table 2: The ARL versus threshold for the Shiryaev-Roberts test for q = 3

A             1       2       5       10      20      30       40       50       100
ARL_SR        4       8       20      40      80      120      160      200      400
MC ARL_SR     4.01    8.03    20.00   39.94   79.99   119.82   159.17   200.42   399.46
MC SD(N_A)    3.00    6.78    18.34   37.92   77.33   117.39   157.19   197.90   396.94



We simulated the CUSUM and Shiryaev-Roberts procedures under the assumption of no change
(i.e., all simulated observations are Exponential(1)). Each combination of (test, threshold) was
simulated 100,000 times. The results are reported in Tables 1 and 2. We present the results of
simulations when the parameter $q = 3$, which is a reasonable value in certain applications such as
detection of a randomly appearing target in noisy measurements, in which case $q$ is the
signal-to-noise ratio (see, e.g., Tartakovsky, 1991 and Tartakovsky and Ivanova, 1992). It is seen that the
approximation (5.9) for the Shiryaev-Roberts test is very accurate for all threshold values, even
when the ARL is small. On the other hand, the approximation (5.10) for the CUSUM test (given
in the row "FO ARL_CU" in Table 1, where FO stands for "first order") is not especially accurate.
This happens primarily because the first order approximation takes into account only the first term
of the expansion and ignores the second term $O(\log A)$ as well as constants. An accurate, higher order
(HO) approximation can be obtained using the results of Tartakovsky and Ivanova (1992), which
give:

$$\mathrm{ARL}_{CU}(A) \approx \frac{(1 + q)^2}{q - \log(1 + q)}\, A - \frac{1}{\log(1 + q) - q/(1 + q)}\, \log A - \frac{1 + q}{q - \log(1 + q)} - \frac{q}{(1 + q)\log(1 + q) - q}.$$

In Table 1 the row "HO ARL_CU" corresponds to this latter approximation, which agrees with
the MC estimates (denoted by "MC ARL_CU") to within about 1% for all tested threshold values $A \ge 1.2$.
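The first-order and higher-order approximations are simple closed-form expressions and can be evaluated directly. The sketch below assumes that the three correction terms of the higher-order formula all enter with a minus sign, an assumption that was checked only against the tabulated "HO" row for $q = 3$; the function names are illustrative.

```python
import math


def arl_sr_first_order(A: float, q: float) -> float:
    """First-order ARL approximation for the Shiryaev-Roberts test: (1 + q) * A."""
    return (1.0 + q) * A


def arl_cusum_first_order(A: float, q: float) -> float:
    """First-order ARL approximation for CUSUM: (1 + q)^2 * A / (q - log(1 + q))."""
    return (1.0 + q) ** 2 * A / (q - math.log1p(q))


def arl_cusum_higher_order(A: float, q: float) -> float:
    """Higher-order ARL approximation for CUSUM (signs of the corrections are assumed)."""
    i1 = q - math.log1p(q)              # q - log(1 + q)
    i0 = math.log1p(q) - q / (1.0 + q)  # log(1 + q) - q / (1 + q)
    return (
        (1.0 + q) ** 2 / i1 * A
        - math.log(A) / i0
        - (1.0 + q) / i1
        - q / ((1.0 + q) * math.log1p(q) - q)
    )


if __name__ == "__main__":
    for A in (1.2, 13.0, 41.0):
        print(A, arl_cusum_first_order(A, 3.0), arl_cusum_higher_order(A, 3.0))
```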

In these tables we also present the MC estimates of the standard deviations $\mathrm{SD}(N_A)$ and $\mathrm{SD}(\hat{N}_A)$
of the stopping times. As one would expect for an approximately exponential distribution, the
standard deviations are approximately the same as the means, and the agreement improves as $A$
increases. The fit is slightly better for the CUSUM test.
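The closeness of the standard deviation to the mean can be quantified by the ratio SD/mean, which equals 1 for an exponential distribution. The sketch below computes this ratio from the MC rows of Tables 1 and 2; the variable and function names are illustrative.

```python
from typing import List

# MC means and standard deviations copied from Tables 1 and 2 (q = 3).
CUSUM_MEAN = [8.04, 12.45, 19.79, 39.57, 84.33, 121.23, 161.88, 200.44, 397.16]
CUSUM_SD = [7.49, 11.88, 19.18, 38.61, 83.21, 119.73, 159.91, 198.97, 396.84]
SR_MEAN = [4.01, 8.03, 20.00, 39.94, 79.99, 119.82, 159.17, 200.42, 399.46]
SR_SD = [3.00, 6.78, 18.34, 37.92, 77.33, 117.39, 157.19, 197.90, 396.94]


def sd_to_mean_ratio(means: List[float], sds: List[float]) -> List[float]:
    """Return SD / mean for each threshold; a value of 1 matches an exponential law."""
    return [s / m for m, s in zip(means, sds)]


if __name__ == "__main__":
    print("CUSUM:", [round(c, 3) for c in sd_to_mean_ratio(CUSUM_MEAN, CUSUM_SD)])
    print("SR:   ", [round(c, 3) for c in sd_to_mean_ratio(SR_MEAN, SR_SD)])
```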



Figure 1: Empirical estimates of $\log[\mathsf{P}_\infty(\tau_A > y)]$ and $\log[\mathsf{P}_\infty(\hat{\tau}_A > y)]$ for the CUSUM and
Shiryaev-Roberts procedures. (a) CUSUM test: $q = 3$, $A = 13$. (b) Shiryaev-Roberts test: $q = 3$, $A = 40$.

Figure 2: QQ-plots for the stopping times of the CUSUM and Shiryaev-Roberts procedures (axis label:
Experimental Quantiles). (a) CUSUM test: $q = 3$, $A = 13$. (b) Shiryaev-Roberts test: $q = 3$, $A = 40$.



Figures 1(a) and 1(b) show the logarithm of the empirical (MC estimates) survival functions
$\log \mathsf{P}_\infty(\tau_A > y)$ and $\log \mathsf{P}_\infty(\hat{\tau}_A > y)$ for the CUSUM and Shiryaev-Roberts procedures, where
$\tau_A = N_A/\mathrm{ARL}_{CU}$ and $\hat{\tau}_A = \hat{N}_A/\mathrm{ARL}_{SR}$ are the corresponding standardized stopping times,
along with the logarithm of the exponential probability plot $\log e^{-y} = -y$. The quantile-quantile
plots (QQ-plots) for the stopping times are shown in Figures 2(a) and 2(b). The QQ-plots display
sample quantiles of $N_A$ and $\hat{N}_A$ versus theoretical quantiles from the exponential distribution. If the
distributions of the stopping times are exponential, the plots will be close to linear. These figures
show that, for the chosen putative value of the post-change parameter ($q = 3$), the exponential
distribution approximates the distributions of the stopping times very well. It is seen that the
exponential approximation works very well already for $A = 13$ ($\mathrm{ARL}_{CU} \approx 120$) for the CUSUM
test and for $A = 40$ ($\mathrm{ARL}_{SR} \approx 160$) for the Shiryaev-Roberts test. Considering that in
practical applications the values of the ARL to false alarm usually range from 300 upwards,
the exponential distribution appears to fit very well.